CLDR-17884 Regenerate AddPopulationData, ConvertLanguageData, reduce standard out noise #3965
java -jar tools/cldr-code/target/cldr-code.jar AddPopulationData
CLDR-17884 Check alternate country names without parentheses too
CLDR-17884 Regex match add country note
CLDR-17884 Remove world bank aggregates
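One possible reading of "Check alternate country names without parentheses too" is that a name from an input file is looked up both as written and with its trailing parenthetical removed. The sketch below illustrates that idea only; `CountryNameLookup`, `candidates`, `findCode`, and the sample map are hypothetical names for illustration and are not taken from the CLDR tools.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch; the real lookup in the CLDR tools may work differently. */
final class CountryNameLookup {
    // Matches a trailing parenthetical such as " (all income levels)".
    private static final Pattern TRAILING_PARENS = Pattern.compile("\\s*\\([^)]*\\)\\s*$");

    private CountryNameLookup() {}

    /** Returns the name as given, then the name without its trailing parenthetical (if any). */
    static Set<String> candidates(String name) {
        Set<String> result = new LinkedHashSet<>();
        String trimmed = name.trim();
        result.add(trimmed);
        result.add(TRAILING_PARENS.matcher(trimmed).replaceFirst(""));
        return result;
    }

    /** Tries each candidate spelling in order; returns null when none is known. */
    static String findCode(String name, Map<String, String> nameToCode) {
        for (String candidate : candidates(name)) {
            String code = nameToCode.get(candidate);
            if (code != null) {
                return code;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative table only; the real alternate-name data is much larger.
        Map<String, String> names = Map.of("Micronesia", "FM");
        System.out.println(findCode("Micronesia (Federated States of)", names)); // prints FM
    }
}
```

Under this assumption, an aggregate such as "Sub-Saharan Africa (all income levels)" still resolves to no code, so a caller could skip it or report it.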
Looks great! I want to review it more tomorrow, but a couple of quick notes.
tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv
This would mean adjusting comment lines at the top with the warning and adding tabs to the ends of lines that don't currently have references.
We can do that -- but there is already a plan to break up this file into smaller pieces to get rid of the redundant country information too -- I'll look into that over the next few weeks. In follow-up pull requests I'll modernize the documentation into markdown files and add better tests. If you like the idea, I was thinking of adding a checksum attribute for generated sections like territoryInfo.
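As a rough illustration of the checksum idea, a generator could hash the text of a generated section and store the result, so that a test can detect hand edits or stale output. This is a minimal sketch under stated assumptions: `SectionChecksum` is a hypothetical helper, the choice of SHA-256 and the whitespace normalization are assumptions, and nothing here reflects an existing CLDR attribute or tool.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; not part of the CLDR tooling. */
final class SectionChecksum {
    private SectionChecksum() {}

    /** Collapses whitespace runs so that re-indenting a section does not change its checksum. */
    static String normalize(String sectionText) {
        return sectionText.trim().replaceAll("\\s+", " ");
    }

    /** Returns the SHA-256 digest of the normalized section text as lowercase hex. */
    static String checksum(String sectionText) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalize(sectionText).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder section text; a real generator would pass its own output.
        String generated = "<territoryInfo>\n  <!-- generated rows would go here -->\n</territoryInfo>";
        System.out.println(checksum(generated));
    }
}
```

A test could then recompute the checksum from the section in the file and compare it with the stored value. Collapsing whitespace is a simplification that would also hide whitespace changes inside attribute values.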
CLDR-17844
I started this ticket because I was seeing a lot of noisy warnings and errors in the regular tests -- I ended up in a rabbit hole with the generated population data. This stack of commits updates the data inputs and fixes errors in the scripts so we can regenerate population data in a stable way now.
Scripts ran:
Script output changes
Many of the standard output messages mentioned in the original ticket no longer appear, mostly because of fixes to the input data sources and to a few processing scripts. If legitimate problems arise in the future, the corresponding warnings and errors will still be reported.
- `world_bank_data.csv`: the warnings about aggregates without country codes, e.g. "Sub-Saharan Africa (all income levels)", are now gone.

Data changed:
- `country_language_population.tsv`: `Cantonese (Traditional) yue` row, otherwise `yue` would disappear in the re-generated `supplementalData.xml` -- introduced in CLDR-17871 Create yue_Hant_CN stub locale #3945
- `factbook_gdp_ppp.csv` & `factbook_gdp_ppp.csv`: CIA Factbook data updated and imported using the csv that's exported by the CIA's website -- see also the old CLDR update documentation.
- `other_country_data.txt`: Added information that used to be in earlier versions of the CIA Factbook.
- `world_bank_data.csv`: Re-generated from the World Bank website. See also the old CLDR update documentation.
- `alternate_country_names.txt`: Removed skipped names that are no longer needed, since we no longer import CIA Factbook aggregates.

Consequences for `supplementalData.xml`
- The `<language>` `territories` attribute should list the territories where the language is official, so some entries were updated. For instance, Mocheno was incorrectly considered an official language of Italy in CLDR-17430 add mhn Mocheno locale #3665

ALLOW_MANY_COMMITS=true